get_effective_cell for getting the contents of Excel cell when the cell is merged #4673
…this allows setting formatting_info=True (GH4438)
…e cell is merged (GH4672)
Is this just a useful method (i.e., you are not using it anywhere)? Tests?
@jreback cancan101@e82bfa4 has tests for the new method.
@cancan101 Oh, OK. Can you elaborate on the utility of this, though? It SEEMS useful, but what is the actual use case?
Sure. There are plenty of Excel documents where there are multiple columns but some columns share one header cell. A given header cell will apply to multiple columns. The classic example that I see is in SEC filings, where you end up with one merged header cell spanning several columns.
@jreback The idea is that a given cell acts as a header for multiple columns.
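To make the use case concrete, here is a minimal sketch of the lookup being described. It is not the code from this PR: the function name `effective_value`, its signature, and the sample sheet are all hypothetical. It assumes merged regions are given as half-open `(row_lo, row_hi, col_lo, col_hi)` tuples, which is how I understand xlrd's `Sheet.merged_cells` to report them.

```python
def effective_value(grid, merged_ranges, row, col):
    """Return the value that visually applies to the cell at (row, col).

    If the cell lies inside a merged region, the value stored in the
    region's top-left cell is returned; otherwise the cell's own value.
    merged_ranges holds half-open (row_lo, row_hi, col_lo, col_hi) tuples
    (an assumption made for this sketch).
    """
    for row_lo, row_hi, col_lo, col_hi in merged_ranges:
        if row_lo <= row < row_hi and col_lo <= col < col_hi:
            return grid[row_lo][col_lo]
    return grid[row][col]


# Hypothetical sheet: each period header is merged across two year columns.
grid = [
    ["", "Three Months Ended", "", "Six Months Ended", ""],
    ["", "2013", "2012", "2013", "2012"],
    ["Revenue", 10, 9, 21, 18],
]
merged = [(0, 1, 1, 3), (0, 1, 3, 5)]

# Column 2 stores an empty string, but the merged header applies to it.
print(effective_value(grid, merged, 0, 2))
```

Only the top-left cell of a merged region stores the text, so reading the raw cell at row 0, column 2 gives an empty string; the lookup above returns the header that a person viewing the sheet would see.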
Can you add a docstring to the method to explain this? To me, what you're saying seems a bit magical, so an example in the docstring would be good. Also, why the name "effective" cell? Is this standard?
@hayd I can definitely write a docstring. If you have something better than "effective", I can certainly change it to that.
Not sure tbh, but there may be an Excel (or more general) term for this. If so, we should use it (I'm still not 100% sure what this is, so I don't think I'm the best person to google it to see if the term exists :) ).
I'd like to close this in favor of a more general way of handling merged cells [at least at the top of columns]. E.g. if you had something like this in the header:
This would become a set of columns that are a [...]. If any of this already exists, then that's great.
and pandas might use something like this function to determine if there are merged cells and, if so, how they should be handled.
@jtratner I like your suggestion of converting the headings into a [...]. FWIW, I did find this forum thread about the issue for Excel files: http://answers.microsoft.com/en-us/office/forum/office_2007-excel/unmerge-cells-and-copy-the-content-in-each/49f46676-e318-4d33-8cac-7c6302214534
http://pandas.pydata.org/pandas-docs/dev/io.html#reading-columns-with-a-multiindex already for csv, basically
@jreback What about HTML?
I think
The HTML that I am thinking about is regular enough. The rows all have the same number of "columns", and the HTML uses colspan where Excel uses merged cells. See for example: http://www.sec.gov/Archives/edgar/data/47217/000104746913006802/a2215416z10-q.htm#CCSCI
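As an illustration of the colspan analogy, here is a rough sketch that reads table rows with the standard-library `html.parser` and repeats each cell's text across its `colspan`, so every row ends up with one entry per logical column. The class and function names are hypothetical, and the sketch ignores `rowspan` and nested tables; it is not how pandas parses HTML.

```python
from html.parser import HTMLParser


class ColspanExpander(HTMLParser):
    """Collect table rows, repeating each cell's text across its colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._span = 0     # colspan of the cell being read; 0 outside a cell
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            # A missing or valueless colspan counts as 1.
            self._span = max(1, int(dict(attrs).get("colspan") or 1))
            self._text = []

    def handle_data(self, data):
        if self._span:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._span:
            # Repeat the cell text once per spanned column.
            self._row.extend(["".join(self._text).strip()] * self._span)
            self._span = 0
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def expand_colspans(html):
    """Return the table rows in html as lists of strings, colspans expanded."""
    parser = ColspanExpander()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup shaped like a filing table header.
sample = (
    "<table>"
    "<tr><th></th><th colspan='2'>Three Months Ended</th></tr>"
    "<tr><td>Revenue</td><td>10</td><td>9</td></tr>"
    "</table>"
)
print(expand_colspans(sample))
```

After expansion both rows have three entries, which is the same shape you would get from an Excel sheet once its merged header cells are repeated.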
I take it back, I forgot that
@jreback What is an "mi"?
multi-index
@cancan101 Yeah, I agree, it's very regular and basically equivalent [i.e., here's a set of cells, some of them have widths and heights]. If you convert to repeating, it's exactly the same as what the CSV reader would do with them.
@cancan101 And if it can't make sense of it [i.e., it doesn't seem regular], it could just fail quickly and make the user munge it themselves.
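A small sketch of the "convert to repeating, otherwise fail quickly" idea: given header rows in which merged cells have already been repeated, zip them into one tuple per column, and raise as soon as the rows do not line up. The name `header_tuples` and the sample rows are hypothetical. The resulting tuples are the kind of input `pandas.MultiIndex.from_tuples` accepts, though this block deliberately avoids importing pandas.

```python
def header_tuples(header_rows):
    """Zip already-expanded header rows into one tuple per column.

    Raises ValueError when the rows differ in width (or there are none),
    leaving irregular layouts for the user to clean up.
    """
    widths = {len(row) for row in header_rows}
    if len(widths) != 1:
        raise ValueError("irregular header: row widths %s" % sorted(widths))
    return list(zip(*header_rows))


# Hypothetical header with the merged period label repeated per column.
rows = [
    ["", "Three Months Ended", "Three Months Ended"],
    ["", "2013", "2012"],
]
print(header_tuples(rows))
```

Failing on mismatched widths keeps the function from guessing at the layout; anything it cannot interpret is left to the caller.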
Closes #4672